Abstract

We propose an extension of the Expected Improvement criterion
commonly used in Kriging-based optimization. We extend it to more
complex Kriging models, e.g. models using derivatives. The target
field of application is CFD problems, where objective functions are
extremely expensive to evaluate, but the theory can also be used in
other fields.



1 INTRODUCTION 

Global optimization is a common task in advanced engineering. The
objective function can be very expensive to calculate or measure. This
is particularly the case in Computational Fluid Dynamics (CFD), where
simulations are extremely expensive and time-consuming. Present-day
CFD codes can also generate exact derivatives of the objective function,
so we can use them in our models. The long computation needed to evaluate
the objective function and (as a rule) the high dimension of the design
space make the optimization process very time-consuming.

A widely adopted strategy for such objective functions is the response
function methodology. It is based on constructing an approximation of
the objective function from some measurements and subsequently
finding points for new measurements that enhance our knowledge about
the location of the optimum.

One of the commonly used response function models is the Kriging
model [?]. This statistical estimation model considers the objective
function to be a realization of a random field. We can construct a
least-squares estimator. If we assume the field to be Gaussian, the
least-squares estimator is the Bayesian estimator. The conditional
distribution of the field with respect to the measurements (a posteriori)
is also Gaussian, with both mean and covariance known.

One of the methods of finding a point for a new measurement is the
Expected Improvement criterion [3]. It uses the Expected Improvement
function:

$$\mathrm{EI}(x) = \mathrm{E} \min\{F_{\min}, F(x)\}$$

where $F$ is the a posteriori field and $F_{\min}$ is the minimum of the
estimator. The new point of measurement is chosen at the minimum of the
EI function.

Many modifications and enhancements of the Kriging model have been
considered. Linear operators, e.g. derivatives, integrals and
convolutions, are easy to incorporate in the model [4][5].

Each of these extensions of the classic Kriging model is based on
measuring something other than what is returned as the response. For
example, we measure the gradient and the value of the function, but the
response is only the function. Expected Improvement states that we
should measure the function at the place where the minimum of the
response can be improved the most. But in the classic model the measured
function and the response function are the same.

The purpose of this paper is to investigate whether the concept of
EI can be extended to enhanced Kriging models.

2 RELATIVE EXPECTED IMPROVEMENT

2.1 Efficient Global Optimization 

Jones et al. [3] propose the Efficient Global Optimization (EGO)
algorithm, based on the Kriging model and Expected Improvement. It
consists of the following steps:

1. Select a learning group $x_1, \dots, x_n$. Measure the objective
function $f$ at these points: $f_i = f(x_i)$.

2. Construct a Kriging approximation $\hat F$ based on the measurements
$f_1, \dots, f_n$.

3. Find the minimum of the $\mathrm{EI}(x)$ function for the approximation.

4. Increment $n$ and set $x_n$ at the minimum of EI.

5. Measure $f_n = f(x_n)$ and go back to step 2.
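The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of Jones et al.: the squared-exponential kernel, its length scale, the zero prior mean, the jitter term and all function names are choices made here, and EI is minimized by brute force on a fixed grid instead of by branch and bound. EI follows the convention of this paper, EI(x) = E min{F_min, F(x)}, so smaller values are better.

```python
import math
import numpy as np

def sq_exp_kernel(a, b, length=0.3):
    """Squared-exponential covariance K(x, y); an assumed kernel choice."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def posterior(x_obs, f_obs, x_new, mean=0.0, jitter=1e-8):
    """A posteriori mean and standard deviation of the field at x_new."""
    k_oo = sq_exp_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_no = sq_exp_kernel(x_new, x_obs)
    m = mean + k_no @ np.linalg.solve(k_oo, f_obs - mean)
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.sum(k_no * np.linalg.solve(k_oo, k_no.T).T, axis=1)
    return m, np.sqrt(np.clip(var, 0.0, None))

def ei(f_min, m, s):
    """EI = E min{f_min, Y} for Y ~ N(m, s^2), elementwise over arrays m, s."""
    out = np.minimum(f_min, m)              # value in the limit s -> 0
    ok = s > 1e-12
    z = (f_min - m[ok]) / s[ok]
    erf = np.vectorize(math.erf, otypes=[float])
    cdf = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    out[ok] = f_min - ((f_min - m[ok]) * cdf + s[ok] * pdf)
    return out

def ego(f, x_init, grid, n_iter=10):
    """Steps 1-5 of the list above, with a grid search for the EI minimum."""
    x_obs = np.array(x_init, dtype=float)
    f_obs = np.array([f(x) for x in x_obs])           # step 1
    for _ in range(n_iter):
        m, s = posterior(x_obs, f_obs, grid)          # step 2
        f_min = m.min()                               # minimum of the estimator
        x_next = grid[np.argmin(ei(f_min, m, s))]     # steps 3 and 4
        x_obs = np.append(x_obs, x_next)
        f_obs = np.append(f_obs, f(x_next))           # step 5
    return x_obs, f_obs
```

A call such as `ego(lambda x: math.sin(3 * x) + 0.3 * x, [0.2, 1.0, 1.8], np.linspace(0, 2, 201))` returns all measured points and values. The grid search is only practical in one or two dimensions, which is why the original algorithm relies on bounds for EI over regions.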

The EI function can have many local minima (it is highly multi-modal)
and is potentially hard to minimize. The original paper proposed a Branch
and Bound Algorithm (BBA) to optimize the EI function efficiently. To use
BBA, the authors had to establish upper and lower bounds on the minimum of
the EI function over a region. This was fairly easy and was the main
source of the effectiveness of EGO. In proposing an extension of the EI
concept, we also have to propose a suitable method for its optimization.

2.2 Gaussian Kriging 

Kriging is a statistical method of approximating a multi-dimensional
function based on its values at a set of points. The Kriging estimator
(approximation) can be interpreted as a least-squares estimator, but also
as a Bayes estimator. We will use the latter interpretation, as in the
original EI definition.

Let us take an objective function $f : \Omega \to \mathbb{R}$. For some
probability space $(\Gamma, \mathcal{F}, P)$, we consider a Gaussian random
field $F$ on $\Omega$ with known mean $\mu$ and covariance $K(x, y)$. Now
we take measurements of the objective at points $x_1, \dots, x_n$ as
$f_i = f(x_i)$. The Bayes estimator of $f$ is:

$$\hat F(y) = \mathrm{E}\big(F(y) \mid F(x_1) = f_1, \dots, F(x_n) = f_n\big)$$

where $\mathrm{E}(A \mid B)$ is the conditional expected value of $A$ with
respect to $B$. This estimator at $y$ will be called the response at $y$,
and the $(x_i, f_i)$ pairs will be called measurements at $x_i$.

Let us take the event $M = \{\forall i \; F(x_i) = f_i\} \subset \Gamma$
and the a posteriori probability space $(M, \mathcal{F}_M, P(\cdot \mid M))$.
$\mathrm{E}_M$ will stand for the expected value a posteriori. The field
$F$ considered on the space $M$ is also a Gaussian random field, with both
mean $\mu_M$ and covariance $K_M$ known. We will call this field the
a posteriori field.
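For reference, the a posteriori mean $\mu_M$ and covariance $K_M$ have the standard closed form obtained by conditioning a Gaussian vector. The shorthand $\mathbf{K}$, $k(y)$, $\mathbf{f}$ and $\boldsymbol{\mu}$ is introduced here only for this formula, and $\mathbf{K}$ is assumed to be invertible:

```latex
\begin{aligned}
\mathbf{K}_{ij} &= K(x_i, x_j), \qquad
k(y) = \big(K(y, x_1), \dots, K(y, x_n)\big)^{T}, \\
\mathbf{f} &= (f_1, \dots, f_n)^{T}, \qquad
\boldsymbol{\mu} = \big(\mu(x_1), \dots, \mu(x_n)\big)^{T}, \\
\mu_M(y) &= \mu(y) + k(y)^{T} \mathbf{K}^{-1} (\mathbf{f} - \boldsymbol{\mu}), \\
K_M(x, y) &= K(x, y) - k(x)^{T} \mathbf{K}^{-1} k(y).
\end{aligned}
```

The a posteriori mean $\mu_M(y)$ is the response at $y$, and $K_M(y, y)$ vanishes at every measured point.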

2.3 From EI to REI 

We would like to estimate how much the minimum of $\hat F$ will be
improved if we measure $f$ at some point. The estimator after a
measurement at $x$ can be written as $\hat F_x = \mathrm{E}_M(F \mid F(x))$.
The best estimate of the effect would be $\mathrm{E}_M \inf_\Omega \hat F_x$.
But computing it would be very time-consuming. The idea of Expected
Improvement (EI) is to take

$$\mathrm{EI}(x) = \mathrm{E}_M \min\{F_{\min}, F(x)\}$$

where $F_{\min}$ is the current minimum of the approximation $\hat F$.
Expected Improvement is in fact the expected value of how the response at
$x$ will improve the current minimum of $\hat F$. Of course the definition
is equivalent to:

$$\mathrm{EI}(x) = \mathrm{E}_M \min\{F_{\min}, \mathrm{E}_M(F(x) \mid F(x))\}$$
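Because $F(x)$ is Gaussian a posteriori, this expectation has a closed form. A short derivation follows, writing $m = \mu_M(x)$ and $s^2 = K_M(x, x)$ for the a posteriori mean and variance at $x$ (shorthand introduced here), assuming $s > 0$, and denoting the standard normal distribution function and density by $\Phi_0$ and $\varphi_0$ to keep them apart from other symbols in the text:

```latex
\begin{aligned}
\mathrm{EI}(x) &= \mathrm{E}_M \min\{F_{\min}, F(x)\}
  = F_{\min} - \mathrm{E}_M \max\{F_{\min} - F(x),\, 0\} \\
 &= F_{\min} - \big[(F_{\min} - m)\,\Phi_0(z) + s\,\varphi_0(z)\big],
  \qquad z = \frac{F_{\min} - m}{s}.
\end{aligned}
```

In the limit $s \to 0$ the expression tends to $\min\{F_{\min}, m\}$, so a point that has already been measured offers no expected gain over $F_{\min}$.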

The conditional-expectation formulation of EI has a natural extension.
Let us define, for a set of points $\eta = \{\eta_1, \dots, \eta_l\}$, an
augmented estimator
$\hat F_\eta(x) = \mathrm{E}_M(F(x) \mid F(\eta_1), \dots, F(\eta_l))$.
For another set of points $\zeta = \{\zeta_1, \dots, \zeta_k\}$ we can define:

$$\mathrm{REI}(\zeta, \eta) = \mathrm{E}_M \min\{F_{\min}, \hat F_\eta(\zeta_1), \dots, \hat F_\eta(\zeta_k)\}$$

Our Relative Expected Improvement (REI) is the expected value of how
much the response at $\zeta$ will improve the minimum of $\hat F$ if we
measure at $\eta$. This definition implies
$\mathrm{REI}(\{x\}, \{x\}) = \mathrm{EI}(x)$.
We can also use a more general version:

$$\mathrm{REU}(\zeta, \eta) = \mathrm{E}_M \min\{\hat F_\eta(\zeta_1), \dots, \hat F_\eta(\zeta_k)\}$$
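Under the a posteriori measure the augmented estimator is a linear function of the Gaussian vector $F(\eta)$, so the response values at $\zeta$ are jointly Gaussian with mean $\mu_M(\zeta)$ and covariance $K_M(\zeta, \eta) K_M(\eta, \eta)^{-1} K_M(\eta, \zeta)$. A minimal Monte Carlo sketch of REI based on that observation is given below. The function names, the array arguments (a posteriori moments supplied by the caller), the sample size and the seed are assumptions of this sketch, not part of the method.

```python
import numpy as np

def expected_min_mc(mean, cov, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E min{g_1, ..., g_k} for g ~ N(mean, cov).

    Returns (estimate, standard_error). An eigendecomposition is used
    instead of a Cholesky factor because cov may be singular.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    w, v = np.linalg.eigh(cov)
    root = v * np.sqrt(np.clip(w, 0.0, None))    # root @ root.T ~ cov
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, mean.size))
    mins = (mean + z @ root.T).min(axis=1)
    return mins.mean(), mins.std(ddof=1) / np.sqrt(n_samples)

def rei_mc(f_min, m_zeta, k_zeta_eta, k_eta_eta, **kwargs):
    """REI(zeta, eta) from a posteriori moments.

    m_zeta:      a posteriori mean at the k response points, shape (k,)
    k_zeta_eta:  a posteriori covariance K_M(zeta, eta), shape (k, l)
    k_eta_eta:   a posteriori covariance K_M(eta, eta), shape (l, l)
    """
    m_zeta = np.asarray(m_zeta, dtype=float)
    k_zeta_eta = np.asarray(k_zeta_eta, dtype=float)
    k_eta_eta = np.asarray(k_eta_eta, dtype=float)
    cov = k_zeta_eta @ np.linalg.solve(k_eta_eta, k_zeta_eta.T)
    # Treat F_min as an extra, deterministic component of the vector.
    mean = np.concatenate(([f_min], m_zeta))
    full = np.zeros((mean.size, mean.size))
    full[1:, 1:] = cov
    return expected_min_mc(mean, full, **kwargs)
```

For a single point with $\zeta = \eta = \{x\}$ the covariance reduces to $K_M(x, x)$ and the estimate agrees with EI up to sampling error, which matches the identity stated above. If the measurement at $\eta$ is uncorrelated with the response at $\zeta$, the covariance is zero and REI equals $\min\{F_{\min}, \mu_M(\zeta_i)\}$, that is, the measurement is expected to change nothing.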

The main advantage of the REI function is that we can examine the
response in a different region than the region of acceptable measurements.
A simple example illustrates this well:

Example 1 We are searching for some mineral. We have to estimate the
maximum mineral content in somebody's land before buying it. We cannot
drill on his estate, but we can drill everywhere around it.

In this example the response and the measurements are in different
regions, so we cannot use EI. If the estate is $A$ and the surrounding
ground is $B$, then in order to find the best place to drill we would have
to search for the minimum of $\mathrm{REI}(\{x\}, \{y\})$ for $x \in A$ and
$y \in B$.



3 APPLICATION 

3.1 Populations of measurement points 

The first application of REI instead of EI is when we want to find a
collection of measurement points instead of a single point, e.g. when the
objective function can be computed simultaneously at these points. This
makes the optimization process more parallel.

Figure 2: (a) One point of measurement. (b) Population of measurement points.



Example 2 We have k processors to solve our CFD problem, each run- 
ning a separate flow case. 

This procedure could be, for example, to optimize
$\mathrm{REI}(\{\zeta_1, \dots, \zeta_k\}, \{\zeta_1, \dots, \zeta_k\})$.
The main advantage of using such an expression over using some selection
of EI minima is that REI considers the correlation between these points.
For example, if $x$ and $y$ are strongly correlated, we do not want to
measure at both of these points, because the value at $x$ implies the
value at $y$.

3.2 Input enhancements

The other field of application is enhancing the Kriging model with
accessible information other than the values at points.

Let us define a generalized point as a pair $(x, P)$, where $x \in \Omega$
is a point and $P$ is a linear operator. We can say that
$f(x, P) = (Pf)(x)$. The field $F(x, P)$ is also Gaussian with:

$$\mu(x, P) = (P\mu)(x)$$

$$K(x, P; y, S) = P_x S_y K(x, y)$$

where $P_x$ stands for applying $P$ to $K$ as a function of the first
argument, and $S_y$ for applying $S$ to $K$ as a function of the second.

Now all the earlier definitions can be extended to generalized points. (In
fact this enhancement can be done by enlarging $\Omega$ to
$\Omega \times \{\mathrm{Id}, P, S, \dots\}$.)
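As an illustration of the covariance of generalized points, the sketch below takes a one-dimensional squared-exponential covariance and the operators Id and d/dx. The kernel, its length scale and the function names are assumptions made for this example. The derivative formulas are obtained by differentiating the kernel by hand.

```python
import math

LENGTH = 0.5  # assumed length scale of the kernel

def k(x, y):
    """K(x, y) = exp(-(x - y)^2 / (2 LENGTH^2))."""
    return math.exp(-0.5 * ((x - y) / LENGTH) ** 2)

def k_d_id(x, y):
    """Covariance of (x, d/dx) with (y, Id): dK/dx."""
    return -(x - y) / LENGTH**2 * k(x, y)

def k_d_d(x, y):
    """Covariance of (x, d/dx) with (y, d/dy): d^2 K / dx dy."""
    return (1.0 / LENGTH**2 - (x - y) ** 2 / LENGTH**4) * k(x, y)

def cov_generalized(x, p, y, s):
    """K(x, P; y, S) = P_x S_y K(x, y) for P, S in {"id", "d"}."""
    if p == "id" and s == "id":
        return k(x, y)
    if p == "d" and s == "id":
        return k_d_id(x, y)
    if p == "id" and s == "d":
        return k_d_id(y, x)  # K is symmetric, so dK/dy(x, y) = dK/dx(y, x)
    if p == "d" and s == "d":
        return k_d_d(x, y)
    raise ValueError("operators must be 'id' or 'd'")
```

With these four cases a Kriging matrix can mix values and derivatives freely: each row and column simply corresponds to a generalized point instead of a plain point.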

Example 3 The CFD code is solving the main and the adjoint problem. 
We have both the value of our objective and its derivatives with respect to 
design parameters. We want to find the best place to measure these values. 

We can use $f(x, \frac{\partial}{\partial x_i}) = \frac{\partial f}{\partial x_i}(x)$
to interpret measuring the derivatives of $f$ as measuring at the points
$(x, \frac{\partial}{\partial x_i})$. In the example we have calculated not
only the value at $(x, \mathrm{Id})$, but also the values at
$(x, \frac{\partial}{\partial x_i})$. If we have $d$ design parameters (that
is, $\Omega \subset \mathbb{R}^d$), we have $d + 1$ measurements
simultaneously. We can now optimize:

$$\mathrm{REI}\big(\{(\zeta_1, \mathrm{Id}), \dots, (\zeta_{d+1}, \mathrm{Id})\}, \{(x, \mathrm{Id}), (x, \tfrac{\partial}{\partial x_1}), \dots, (x, \tfrac{\partial}{\partial x_d})\}\big)$$

We take $d + 1$ points of response $\zeta_i$ to maximize the effect of all
the measured derivatives. We could of course use EI. In that case we would
select the next point as if we were measuring only the value. By using REI
we incorporate the derivative information not only in the model, but also
in the selection process. The disadvantage of such an expression is that
we search in $\Omega^{d+2}$, which is $d \cdot (d + 2)$-dimensional.

3.3 Multi-effect response 

Next on our list is the multi-effect model. We can imagine that our
measured function is composed of several independent or dependent effects,
while our objective function is only one of them. The simplest case is when
we want to optimize an objective which we measure with an unknown error.

Let us now say that $F$ consists of several components,
$F(x) = (Z(x), W(x), V(x), \dots)$. The same letters will stand for linear
operators, such that $F(x, Z) = Z(x)$.

Example 4 Suppose that we are searching for mineral A, but our drilling
equipment for measuring the content of A cannot distinguish it from another
mineral B. We know, on the other hand, that the latter is distributed
randomly and in small patches.

Let $Z$ be our objective function (mineral A content) and $\varepsilon$ a
spatially-correlated error (mineral B content). We can measure only
$Z + \varepsilon$, while we want to optimize $Z$. In this example we can
optimize $\mathrm{REI}(\{(x, Z)\}, \{(x, Z + \varepsilon)\})$. The required
moments follow from the generalized-point formulas of Section 3.2,
$\mu(x, P) = (P\mu)(x)$ and $K(x, P; y, S) = P_x S_y K(x, y)$. Such a
procedure will simultaneously take into consideration optimization of the
objective and correction of the error. To fully understand why this example
is important, we have to remember that drilling in the same place twice
would give the same result. The error correction in our procedure will bear
this in mind and will avoid duplication of measurements.

This model can include results obtained from lower-quality numerical
calculations. For a (non-random) iterative algorithm, we can set a higher
error bound and reduce the number of iterations. We cannot assume the error
to be fully random because, starting from the same parameters, the
algorithm will give the same results. That is why a good Kriging model
would treat the error as a narrowly correlated random field $\varepsilon$.




Figure 3: (a) High- and (b) low-fidelity models.



Example 5 We have two CFD models, one accurate and the other approximate
but very fast (high- and low-fidelity models). We also know that the
low-fidelity model is "smooth" with respect to the design parameters.

Let $Z$ be our objective function and $W$ be an approximation of $Z$. In
this example we can separately optimize:

$$\mathrm{REI}(\{(\zeta_1, Z), \dots, (\zeta_k, Z)\}, \{(x, Z)\})$$

$$\mathrm{REI}(\{(\zeta_1, Z), \dots, (\zeta_k, Z)\}, \{(x, W)\})$$

and subsequently choose between these two points. The field $W$ is strongly
spatially correlated ("smooth"), and as such its measurement can have a
wider effect than that of $Z$. We can also take into consideration the cost
of the computation and select the point with the better improvement-to-cost
ratio.

3.4 Robust response 

The last field of application that we will discuss is the robust response.
If, for instance, the optimal solution will be used after optimization to
manufacture some objects, we can be sure that the objects will be
manufactured within a certain tolerance. In other words, if the selected
point is $x$, the actual point will be $x + \varepsilon$. Our real objective
function is the average performance of these $x + \varepsilon$.

Figure 4: (a) Designed and (b) manufactured product.



Example 6 Suppose we can calculate the drag force of a car. Our factory
makes cars with some known accuracy. We want to find the car shape that
will give the lowest average drag when made in our factory.

Let $Z$ be our objective function and $\varepsilon$ the manufacturing
error. We can measure only $Z(x)$, while we want to optimize
$\mathrm{E}\,Z(x + \varepsilon)$. Let us say that $\varepsilon$ is a random
variable (for instance $N(0, \Sigma)$), and let $\phi_\varepsilon$ be its
probability density. Now
$\mathrm{E}(h(x + \varepsilon)) = (\phi_\varepsilon * h)(x) = h(x, \phi_\varepsilon *)$.
In the above example we can use:

$$\mathrm{REI}(\{(\zeta, \phi_\varepsilon * Z)\}, \{(\eta, Z)\})$$

The robust response stated as above has a good physical interpretation. It
is also fairly easy to use, as long as we can effectively calculate the
convolution of $\phi_\varepsilon$ and the covariance function.

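It is also useful to look at this kind of robust response as a penalty on
the second derivative. If $\varepsilon \sim N(0, \Sigma)$, then:

$$\mathrm{E}(h(x + \varepsilon)) \approx h(x) + \frac{1}{2} \sum_{i,j} \Sigma_{ij} \frac{\partial^2 h}{\partial x_i \partial x_j}(x)$$

Of course such a penalty is also a linear operator,
$P_\Sigma h = h + \frac{1}{2} \sum_{i,j} \Sigma_{ij} \frac{\partial^2 h}{\partial x_i \partial x_j}$,
and as such can be used instead of $\phi_\varepsilon *$. This approach
can be useful for convolutions that are expensive to calculate.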
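The second-order penalty can be checked numerically. The sketch below evaluates $h(x) + \frac{1}{2} \sum_{i,j} \Sigma_{ij} \partial^2 h / \partial x_i \partial x_j$ with a finite-difference Hessian. The function names and the step size are assumptions of this example, and the finite-difference Hessian stands in for whatever derivative information is actually available.

```python
import numpy as np

def hessian_fd(h, x, step=1e-3):
    """Central finite-difference Hessian of the scalar function h at x."""
    x = np.asarray(x, dtype=float)
    d = x.size
    hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d)
            ej = np.zeros(d)
            ei[i] = step
            ej[j] = step
            hess[i, j] = (h(x + ei + ej) - h(x + ei - ej)
                          - h(x - ei + ej) + h(x - ei - ej)) / (4.0 * step**2)
    return hess

def robust_second_order(h, x, sigma, step=1e-3):
    """h(x) + 1/2 * sum_ij sigma_ij * d2h/dx_i dx_j (x)."""
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return h(x) + 0.5 * np.sum(sigma * hessian_fd(h, x, step))
```

For a quadratic $h(x) = x^T A x + b^T x$ the approximation is exact, since $\mathrm{E}\,h(x + \varepsilon) = h(x) + \mathrm{tr}(A \Sigma)$, so a quadratic makes a convenient test case. For other functions the quality of the approximation depends on how small $\Sigma$ is relative to the scale on which the Hessian changes.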



4 OPTIMIZATION 

4.1 Upper bounds 

As Jones et al. [3] noted, the EI function can be highly multi-modal and
potentially hard to optimize. To use the branch and bound algorithm (BBA),
we have to establish good upper bounds on REI.
We defined REI to be:

$$\mathrm{REI}(\zeta, \eta) = \mathrm{E}_M \min\{F_{\min}, \hat F_\eta(\zeta_1), \dots, \hat F_\eta(\zeta_k)\}$$

where $\hat F_\eta(x) = \mathrm{E}_M(F(x) \mid F(\eta_1), \dots, F(\eta_l))$.
It is clear that $\hat F_\eta$ is a Gaussian field (in fact with only $l$
degrees of freedom). We can calculate its mean and covariance depending on
$\eta$. In such a case we would want to establish upper bounds for an
expression:

$$\Phi_{\mu,\Sigma} = \mathrm{E} \min\{\gamma_1, \dots, \gamma_k\}$$

for some $\gamma \sim N(\mu, \Sigma)$. To bound such an expression, we can
use recent extensions of the comparison principle by Vitale [?]. The
comparison principle states that the greater the
$\mathrm{E}(\gamma_i - \gamma_j)^2 = \Sigma_{ii} + \Sigma_{jj} - 2\Sigma_{ij}$
are, the greater $\Phi_{\mu,\Sigma}$ is. To calculate the upper bound for
REI, we can maximize these expressions over a region and then calculate the
independent but differently distributed (IDD) Gaussian variables dominating
REI. Construction of such dominating IDD variables is discussed in Ross [5].
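Once a set of independent Gaussian variables is available, their expected minimum no longer needs sampling: it reduces to a one-dimensional integral of the product of the survival functions. The sketch below shows only this last step. It does not construct the dominating variables of Ross or apply Vitale's comparison results, and the function names, grid size and integration width are assumptions of this example.

```python
import math

def norm_sf(t, m, s):
    """P(Y > t) for Y ~ N(m, s^2), with s > 0."""
    return 0.5 * math.erfc((t - m) / (s * math.sqrt(2.0)))

def expected_min_independent(means, sds, n_grid=20001, width=10.0):
    """E min{g_1, ..., g_k} for independent g_i ~ N(means[i], sds[i]^2).

    Uses E X = lo + integral_{lo}^{hi} P(X > t) dt, which holds up to a
    negligible tail error when lo and hi lie `width` standard deviations
    outside all the means. The integral is a trapezoid rule on a grid.
    """
    lo = min(m - width * s for m, s in zip(means, sds))
    hi = max(m + width * s for m, s in zip(means, sds))
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        t = lo + i * step
        # Independence: P(min > t) is the product of the survival functions.
        surv = math.prod(norm_sf(t, m, s) for m, s in zip(means, sds))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * surv * step
    return lo + total
```

For two independent standard normal variables the result is $-1/\sqrt{\pi}$, and for a single variable it returns the mean, which gives two quick sanity checks.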

4.2 Exact calculation 

In the last iterations of BBA the IDD-based bounds will be insufficient.
The main direction of further research will be to establish a good method
of calculating an exact bound on $\Phi_{\mu,\Sigma}$. Current algorithms in
this field are based on Monte Carlo or quasi-Monte Carlo methods, for
instance using results by Genz [?].

5 CONCLUSIONS 

Relative Expected Improvement is proposed to extend the concept of EI
to more complex Kriging models. It can help in the search for new points
of measurement and for populations of such points. It can also help to use
derivative information more efficiently. Further research is needed to find
an efficient implementation of this concept.